Abstract

This paper describes an algorithm for the computation of FIRST and
FOLLOW sets for use with feature-theoretic grammars, in which the
value of the sets consists of pairs of feature-theoretic categories.
The algorithm preserves as much information from the grammars as
possible, using negative restriction to define equivalence classes.
Addition of a simple data structure leads to an order of magnitude
improvement in execution time over a naive implementation.

1 Introduction

The need for efficient parsing is a constant one in Natural Language
Processing. With the advent of feature-theoretic grammars, many of
the optimization techniques that were applicable to Context Free (CF)
grammars have required modification. For instance, a number of
algorithms used to extract parsing tables from CF grammars have
involved discarding information which otherwise would have
constrained the parsing process, Briscoe and Carroll (1993). This
paper describes an extension to an algorithm that operates over CF
grammars to make it applicable to feature-theoretic ones. One
advantage of the extended algorithm is that it preserves as much of
the information in the grammar as possible.

1.1 FIRST AND FOLLOW

In order to make more efficient parsers, it is sometimes necessary
to preprocess (compile) a grammar to extract from it top-down
information to guide the search during analysis. The first step in
the preprocessing stage of several compilation algorithms requires
the solution of two functions normally called FIRST and FOLLOW.
Intuitively, FIRST(X) gives us the terminal symbols that may appear
in initial position in substrings derived from category X. FOLLOW(X)
gives us the terminals which may immediately follow a substring of
category X. For example, in the grammar S → NP VP; NP → det noun;
VP → vtra NP, we get:

FIRST(S) = FIRST(NP) = {det},
FIRST(VP) = {vtra},
FOLLOW(NP) = {vtra, $},
FOLLOW(S) = FOLLOW(VP) = {$}
($ marks end of input)
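The values above can be checked mechanically. The following Python
sketch is an illustration only, not the implementation discussed in
this paper: it assumes atomic symbols and a grammar without empty
productions, and the names compute_first and compute_follow are
hypothetical.

```python
# Sketch: FIRST and FOLLOW for the three-rule example grammar.
# Assumptions: atomic symbols, no empty productions, "$" marks end of input.
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("NP", ["det", "noun"]),
    ("VP", ["vtra", "NP"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def compute_first(grammar):
    """Iterate to a fixed point; only the first daughter matters here."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(grammar, first, start="S"):
    """FOLLOW of a daughter is FIRST of its right neighbour, or FOLLOW of
    the mother when the daughter is rightmost."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST)
```

Both functions repeat their pass over the rules until nothing changes,
which is the same fixed-point style used by the procedures discussed
in the rest of the paper.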



These two functions are important in a large 
range of algorithms used for constructing ef- 
ficient parsers. For example the LR-parser 
construction algorithm given in Aho et al. 
(1986:232) uses FIRST to compute item clo- 
sure values. Another example is the compu- 
tation of the Z* relation which is used in the 
construction of generalized left-corner parsers, 
Nederhof (1993); this relation is effectively an 
extension of the function FIRST. 



2 Computing FIRST and 
FOLLOW 

We propose an algorithm for the computa- 
tion of FIRST values which handles feature- 
theoretic grammars without having to extract 
a CF backbone from them; the approach is eas- 
ily adapted to compute FOLLOW values too. 
An improvement to the algorithm is presented 
towards the end of the paper. Before describ- 
ing the algorithm, we give a well known proce- 
dure for computing FIRST for CF grammars 
(taken from Aho et al. (1986:189), where e is 
the empty string):

"To compute FIRST(X) for all grammar symbols X, apply the following
rules until no more terminals or e can be added to any FIRST set.

1. If X is terminal, then FIRST(X) is {X}.

2. If X → e is a production, then add e to FIRST(X).

3. If X is nonterminal and X → Y_1 Y_2 ... Y_k is a production, then
place a in FIRST(X) if for some i, a is in FIRST(Y_i), and e is in
all of FIRST(Y_1) ... FIRST(Y_{i-1}); that is, Y_1 ... Y_{i-1} ⇒* e.
If e is in FIRST(Y_j) for all j = 1, 2, ..., k, then add e to
FIRST(X).

Now, we can compute FIRST for any string X_1 X_2 ... X_n as follows.
Add to FIRST(X_1 X_2 ... X_n) all of the non-e symbols of FIRST(X_1).
Also add the non-e symbols of FIRST(X_2) if e is in FIRST(X_1), the
non-e symbols of FIRST(X_3) if e is in both FIRST(X_1) and
FIRST(X_2), and so on. Finally, add e to FIRST(X_1 X_2 ... X_n) if,
for all i, FIRST(X_i) contains e."

This algorithm will form the basis of our pro- 
posal. 
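
As a point of reference, the quoted CF procedure can be sketched in
Python as follows. This is a minimal illustration under stated
assumptions: symbols are atomic strings, an empty right-hand side
stands for a production X → e, and the function names and the small
test grammar are hypothetical rather than taken from Aho et al.

```python
EPS = "e"  # stands for the empty string; no terminal may be named "e"


def first_of_string(symbols, first):
    """FIRST for a string of symbols, given FIRST for single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result          # this symbol cannot vanish: stop here
    result.add(EPS)                # every symbol (or none) can derive e
    return result


def first_sets(rules, terminals):
    """rules: list of (lhs, rhs); an empty rhs is the production lhs -> e."""
    first = {t: {t} for t in terminals}            # rule 1
    for lhs, _ in rules:
        first.setdefault(lhs, set())
    changed = True
    while changed:                                 # until nothing can be added
        changed = False
        for lhs, rhs in rules:
            new = first_of_string(rhs, first)      # rules 2 and 3
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


# Hypothetical grammar: S -> A b, A -> a, A -> e
RULES = [("S", ["A", "b"]), ("A", ["a"]), ("A", [])]
FIRST = first_sets(RULES, {"a", "b"})
```

Rules 2 and 3 collapse into one call because an empty right-hand side
is simply a string all of whose symbols (there are none) derive e.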

3 Compiling Feature- 
Theoretic Grammars 

3.1 Equivalence Classes 

The main reason why the above algorithm cannot be used with
feature-theoretic grammars is that in general the number of possible
nonterminals allowed by the grammar is infinite. One of the simplest
ways of showing this is where a grammar accumulates the orthographic
representation of its terminals as one of its feature values. It is
not difficult to see how one can have an infinite number of NPs in
such a grammar:

NP[orth: the dog] 
NP[orth: the fat dog] 
NP[orth: the big fat dog], etc. 

This means that FIRST(NP[orth: the dog]) would have a different value
to FIRST(NP[orth: the fat dog]) even though they share the same
leftmost terminal. That is, the feature structure for the substring
"det adj noun" will be different to that for "det noun" even though
they have the same starting symbol. This point is important since
similar situations arise with the subcategorization frame of verbs
and the semantic value of categories in contemporary theories of
grammar, Pollard and Sag (1987). Without modification, the algorithm
above would not terminate.

The solution to this problem is to define a finite number of
equivalence classes into which the infinite number of nonterminals
may be sorted. These classes may be established in a number of ways;
the one we have adopted is that presented by Harrison and Ellison
(1992) which builds on the work of Shieber (1985): it introduces the
notion of a negative restrictor to define equivalence classes. In
this solution a predefined portion of a category (a specific set of
paths) is discarded when determining whether a category belongs to an
equivalence class or not. For instance, in the above example we could
define the negative restrictor to be {orth}. Applying this negative
restrictor to each of the three NPs above would discard the
information in the 'orth' feature to give us three equivalent
nonterminals. It is clear that the restrictor must be such that it
discards features which in one way or another give rise to an
infinite number of nonterminals. Unfortunately, termination is not
guaranteed for all restrictors, and furthermore, the best restrictor
cannot be chosen automatically since it depends on the amount of
grammatical information that is to be preserved. Thus, selection of
an appropriate restrictor will depend on the particular grammar or
system used.
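
To make the effect of a negative restrictor concrete, the sketch
below models categories as nested Python dictionaries and a
restrictor as a set of feature paths. The representation and the
function name restrict are assumptions made for illustration; they
are not the data structures of any particular grammar system.

```python
def restrict(category, restrictor):
    """Return a copy of `category` without the feature paths in `restrictor`.

    `category` is a nested dict; `restrictor` is a set of paths, each a
    tuple of feature names.
    """
    result = {}
    for feature, value in category.items():
        if (feature,) in restrictor:
            continue  # this path is discarded outright
        if isinstance(value, dict):
            # Pass down the remainder of any longer paths through `feature`.
            deeper = {p[1:] for p in restrictor if len(p) > 1 and p[0] == feature}
            value = restrict(value, deeper)
        result[feature] = value
    return result


NPS = [
    {"cat": "NP", "orth": "the dog"},
    {"cat": "NP", "orth": "the fat dog"},
    {"cat": "NP", "orth": "the big fat dog"},
]
RESTRICTOR = {("orth",)}
CLASSES = [restrict(np, RESTRICTOR) for np in NPS]
```

After restriction the three NPs are identical, so they fall into a
single equivalence class.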

3.2 Value Sharing 

Another problem with the algorithm above is 
that reentrancies between a category and its 
FIRST and FOLLOW values are not preserved 
in the solution to these functions; this is be- 
cause the algorithm assumes atomic symbols 
and these cannot encode explicitly shared in- 
formation between categories. For example, 
consider the following naive grammar:

S → NP[agr: X] VP[agr: X]
VP[agr: X] → Vint[agr: X]
NP[agr: X] → Det N[agr: X]

We would like the solution of FOLLOW(N) to include the binding of the
'agr' feature such that the value of FOLLOW resembled:
FOLLOW(N[agr: X]) = Vint[agr: X]. But the algorithm above, even with
a restrictor, would not preserve such a binding since the addition of
a new category to FOLLOW(N) is done independently of the bindings
between the new category and N.

4 The Basic Algorithm 

We propose an algorithm which, rather than 
construct a set of categories as the value of 
FIRST and FOLLOW, constructs a set of pairs 
each of which represents a category and its 
FIRST or FOLLOW category, with all the cor- 
rect bindings explicitly encoded. For instance, 
for the above example, the pair (VP[agr: X], 
Vint [agr: X]) would be in the set representing 
the value of the function FIRST. In the next 
section the algorithm for computing FIRST is 
described; computation of FOLLOW proceeds 
in a similar fashion. 

4.1 Solving FIRST 

When modifying the algorithm of Section 2 we note that each
occurrence of a category in the grammar is potentially distinct from
every other category. In addition, for each category we need to
remember all the reentrancies between it and the daughters within the
rule in which it occurs. Finally, we assume that any category in a
rule which can unify with a lexical category is marked in some way,
say by using the feature-value pair 'ter: +', and that non-terminal
categories must unify with the mother of some rule in the grammar;
the latter condition is necessary because the algorithm only computes
the solution of FIRST for lexical categories or for categories that
occur as mothers.

In computing FIRST we iterate over all the rules in the grammar,
treating the mother of each rule as the category for which we are
trying to find a FIRST value. Throughout each iteration, unification
of a daughter with the lhs of an element of FIRST results in a
modified rule and a modified pair in which bindings between the
mother category and the rhs of the pair are established. The modified
mother and rhs are then used to construct the pair which is added to
FIRST. For instance, given rule X → Y and pair (L, R), we unify Y and
L to give X' → Y' and (L', R'); from these the pair (X', R') is
constructed and added to FIRST.
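
The construction of (X', R') from X → Y and (L, R) can be sketched as
below. This is a simplified illustration: categories are flat
dictionaries rather than full feature structures, reentrancy is
modelled by sharing a variable object, and the names Var, unify and
extend_pair are hypothetical. The rule and the pair are assumed not
to share variables.

```python
class Var:
    """A variable standing for a shared (reentrant) feature value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"?{self.name}"


def walk(value, bindings):
    """Follow variable bindings until an atom or an unbound variable."""
    while isinstance(value, Var) and value in bindings:
        value = bindings[value]
    return value


def unify(cat1, cat2):
    """Unify two flat categories; return the bindings, or None on a clash."""
    bindings = {}
    for feature in cat1.keys() & cat2.keys():
        v1, v2 = walk(cat1[feature], bindings), walk(cat2[feature], bindings)
        if v1 == v2:
            continue
        if isinstance(v1, Var):
            bindings[v1] = v2
        elif isinstance(v2, Var):
            bindings[v2] = v1
        else:
            return None            # two different atoms
    return bindings


def extend_pair(mother, daughter, pair):
    """Given rule mother -> daughter and pair (L, R), build (X', R')."""
    lhs, rhs = pair
    bindings = unify(daughter, lhs)
    if bindings is None:
        return None
    apply = lambda cat: {f: walk(v, bindings) for f, v in cat.items()}
    return apply(mother), apply(rhs)


# Rule VP[agr: X] -> Vint[agr: X] and pair (Vint[agr: Z], Vint[agr: Z]).
X, Z = Var("X"), Var("Z")
VP_PAIR = extend_pair(
    {"cat": "VP", "agr": X},
    {"cat": "Vint", "agr": X},
    ({"cat": "Vint", "agr": Z}, {"cat": "Vint", "agr": Z}),
)
```

In the resulting pair the 'agr' values of VP and Vint are the same
variable, which is how the binding survives in this sketch.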

The algorithm assumes an operation +< which constructs a set
S' = S +< p in the following way: if pair p subsumes an element a of
S then S' = S - a + p; if p is subsumed by an element of S then
S' = S; else S' = S + p. It should be noted that the pairs
constituting the value of FIRST can themselves be compared using the
subsumption relation in which reentrant values are subsumed by
non-reentrant ones, and combined using the unification operation.
Thus in the principal step of the algorithm, a new pair is
constructed as described above, a restrictor is applied to it, and
the resulting, restricted pair is +<-added to FIRST. The algorithm is
as follows:

1. Initialise First = {}.

2. Run through all the daughters in the grammar. If X is
pre-terminal, then First = First +< (X, X)!Φ (where (X, X)!Φ means
apply the negative restrictor Φ to the pair (X, X)).

3. For each rule in the grammar with mother X, apply steps 4 and 5
until no more changes are made to First.

4. If the rule is X → e, then First = First +< (X, e)!Φ.

5. If the rule is X → Y_1..Y_i..Y_k, then First = First +< (X', a)!Φ
if (Y_i', a) has successfully unified with an element of First, and
(Y_1', e_1)...(Y_{i-1}', e_{i-1}) have all successfully and
simultaneously unified with members of First. Also,
First = First +< (X', e)!Φ if (Y_1', e_1)...(Y_k', e_k) have all
successfully and simultaneously unified with elements of First.

6. Now, for any string of categories X_1..X_i..X_n,
First = First +< (X_1'...X_n', a)!Φ if (X_1', a) has successfully
unified with an element of First, and a ≠ e. Also, for i = 2...n,
First = First +< (X_1'...X_n', a)!Φ if (X_i', a) has successfully
unified with an element of First, a ≠ e, and
(X_1', e_1)...(X_{i-1}', e_{i-1}) have all successfully and
simultaneously unified with members of First. Finally,
First = First +< (X_1'...X_n', e)!Φ if (X_1', e_1)...(X_n', e_n) have
all successfully and simultaneously unified with members of First.
(This step may be computed on demand).

S → NP[agr: X, slash: NULL] VP[agr: X, slash: NULL]
S → NP[slash: NULL] NP[agr: X, slash: NULL] VP[agr: X, slash: NP]
VP[agr: X, slash: Y] → Vtra[agr: X, ter: +] NP[slash: Y]
NP[agr: X, slash: NULL] → Det[ter: +] N[agr: X, ter: +]
NP[slash: NP] → e

Figure 1: Example grammar with value sharing.
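
The +< operation used in the steps above can be sketched as follows.
The sketch assumes that a pair is a tuple of two flat dictionaries,
that reentrancy is modelled by sharing a variable object, and that
subsumption is tested by one-way matching; the names Var, subsumes
and add_pair are hypothetical, and the tests are applied in a
different order from the prose definition without changing the
outcome up to variable renaming.

```python
class Var:
    """A variable; one Var object used twice models a reentrancy."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"?{self.name}"


def subsumes(general, specific):
    """True if pair `general` is at least as general as pair `specific`.

    Each variable of `general` must map consistently onto one value of
    `specific`; variables of `specific` are treated as constants.
    """
    mapping = {}
    for g_cat, s_cat in zip(general, specific):
        for feature, g_val in g_cat.items():
            if feature not in s_cat:
                return False
            s_val = s_cat[feature]
            if isinstance(g_val, Var):
                if mapping.setdefault(g_val, s_val) != s_val:
                    return False   # one variable, two different values
            elif g_val != s_val:
                return False
    return True


def add_pair(pairs, p):
    """The +< operation on a list of pairs; returns the updated list."""
    if any(subsumes(q, p) for q in pairs):
        return pairs               # p is subsumed: nothing to add
    # Drop every element that p subsumes, then add p.
    return [q for q in pairs if not subsumes(p, q)] + [p]


X1, X2, Z = Var("X1"), Var("X2"), Var("Z")
NON_REENTRANT = ({"cat": "VP", "agr": X1}, {"cat": "Vint", "agr": X2})
REENTRANT = ({"cat": "VP", "agr": Z}, {"cat": "Vint", "agr": Z})
```

Here NON_REENTRANT subsumes REENTRANT but not the reverse, matching
the statement that reentrant values are subsumed by non-reentrant
ones.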

One observation on this algorithm is in order. The last action of
steps 5 and 6 adds e as a possible value of FIRST for a mother
category or a string of categories; such a value results when all
daughters or categories have e as their FIRST value. Since most
grammatical descriptions assign a category to e (e.g. to bind onto it
information necessary for correct gap threading), the pairs (X', e)
or (X_1'...X_n', e) should have bindings between their two elements;
this creates the problem of deciding which of the es in the FIRST
pairs to use, since it is possible in principle that each of these
will have a different value for e. In our implementation, the pair
added to First in these situations consists of the mother category or
the string of categories and the most general category for e as
defined by the grammar, thus effectively ignoring any bindings that e
may have within the constructed pair. A more accurate solution would
have been to compute multiple pairs with e, construct their least
upper bound, and then add this to First. However, in our
implementation this solution has not proven necessary.



4.2 Example 

Assuming the grammar in Fig. 1 and the negative restrictor
Φ = {slash}, the following is a simplified run through the algorithm:

• First = {}

• After processing all pre-terminal categories
First = {(Det, Det), (N, N), (Vtra, Vtra)}
(obvious bindings not shown).

• After the first iteration First = {(Det, Det), (N, N),
(Vtra, Vtra), (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, e)}

• Since 'slash' is in the restrictor Φ, any of the NPs in the grammar
will unify with the lhs of (NP, e) and hence S will have Vtra as part
of its FIRST value. First = {.., (VP[agr: X], Vtra[agr: X]),
(NP, Det), (NP, e), (S, Det), (S, Vtra)}

• The next iteration adds nothing and the first stage of the
algorithm terminates.
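
The shape of this run can be reproduced with a deliberately
simplified sketch in which categories are atomic, so that
unification reduces to equality and +< to ordinary set addition. This
corresponds to discarding both 'slash' and 'agr', which is more than
the restrictor above discards; the function name solve_first is
hypothetical.

```python
E = "e"  # the empty string

# Figure 1 with all features stripped (an assumption of this sketch).
RULES = [
    ("S", ["NP", "VP"]),
    ("S", ["NP", "NP", "VP"]),
    ("VP", ["Vtra", "NP"]),
    ("NP", ["Det", "N"]),
    ("NP", []),                      # NP -> e
]
PRETERMINALS = {"Det", "N", "Vtra"}


def solve_first(rules, preterminals):
    """Steps 1 to 5 over atomic categories; First is a set of pairs."""
    first = {(x, x) for x in preterminals}             # steps 1 and 2
    changed = True
    while changed:                                      # step 3
        changed = False
        for mother, daughters in rules:
            new = set()
            all_empty = True
            for d in daughters:
                # step 5: d contributes while every earlier daughter has e
                new |= {(mother, a) for (l, a) in first if l == d and a != E}
                if (d, E) not in first:
                    all_empty = False
                    break
            if all_empty:
                new.add((mother, E))                    # step 4 / end of step 5
            if not new <= first:
                first |= new
                changed = True
    return first


FIRST = solve_first(RULES, PRETERMINALS)
```

With the rules in this order the S pairs appear only in the second
pass, as in the run above, because (NP, Det) and (NP, e) are not yet
in First when the S rules are first visited.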

The second stage (step 6) is done on demand, 
for example to compute state transitions for 
a parsing table, in order to avoid the expense 
of computing FIRST for all possible substrings 
of categories. For instance, to compute FIRST 
for the string [NP NP VP] the algorithm works 
as follows: 



• First = {.., (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, e), ...}

• After considering the first NP:
First = {.., ([NP NP VP], Det)}.

• Consideration of the second NP in the input string results in no
changes to First, given the semantics of +<, since the pair that it
would have added, ([NP NP VP], Det), is already in First.

• Since NPs can rewrite as e (i.e. (NP, e) is in First),
First = {.., ([NP NP VP], Det), ([NP NP VP], Vtra)}.

• Finally, ([NP NP VP], e) may not be added since (VP, e) does not
unify with any element of First.

5 Improving the Search 
Through First 

If the algorithm is run as presented, each iteration through the
grammar rules becomes slower and slower. The reason is that, in step
5, when searching First to create a new pair (X', a), every pair in
First is considered and unification of its lhs with the relevant
daughter of X attempted. Since each iteration normally adds pairs to
First, each iteration involves a search through a larger and larger
set; furthermore, this search involves unification, and in the case
of a successful match, the subsequent construction and addition to
First also requires subsumption checks. All of these operations
combine to make each additional element in First have a strong effect
on the performance of the algorithm. We therefore need to minimize
the number of pairs searched.

Considering the dependencies that exist between pairs in First one
notices that once a pair has been considered in relation with all the
rules in the grammar, the effect of that pair has been completely
determined. That is, after a pair is added to First it need only be
considered up to and including the rule from which it was derived,
after which time it may be excluded from further searches. For
example, take the previous grammar, and in particular the value of
First after the first iteration through the algorithm. The pair
(NP, Det), added because of the rule NP[agr: X, slash: NULL] →
Det[ter: +] N[agr: X, ter: +], has to be considered only once by
every rule in the grammar; after that, this pair cannot be involved
in the construction of new values.

A simple data structure which keeps track 
of those pairs that need to be searched at any 
one time was added to the algorithm; the data 
structure took the form of a list of pointers to 
active pairs in First, where an active pair is 
one which has not been considered by the rule 
from which it was constructed. For example, 
the pair (NP, Det) would be active for a complete iteration from the
moment that the corresponding rule introduced it until that rule is
visited again during the second iteration. The 
effect of this policy is to allow each pair in 
First to be tested against each rule exactly 
once and then be excluded from subsequent 
searches; this greatly reduces the number of 
pairs considered for each iteration. 
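
The bookkeeping can be sketched as follows. This illustration makes
strong simplifying assumptions that are not part of the algorithm
above: categories are atomic and the grammar has no empty
productions, so only the first daughter of a rule matters. A
dictionary from active pairs to the index of the rule that introduced
them stands in for the list of pointers, and all names are
hypothetical.

```python
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["Vtra", "NP"]),
]
PRETERMINALS = {"Det", "N", "Vtra"}


def solve_first_active(rules, preterminals):
    """Return First and the number of pairs searched in each iteration."""
    first = set()
    active = {}                       # active pair -> index of introducing rule
    for x in sorted(preterminals):
        first.add((x, x))
        # Lexical pairs stay active until every rule has seen them once.
        active[(x, x)] = len(rules) - 1
    searched_per_iteration = []
    changed = True
    while changed:
        changed = False
        searched = 0
        for idx, (mother, daughters) in enumerate(rules):
            candidates = list(active)  # only active pairs are searched
            searched += len(candidates)
            for lhs, a in candidates:
                if lhs == daughters[0] and (mother, a) not in first:
                    first.add((mother, a))
                    active[(mother, a)] = idx
                    changed = True
            # Pairs this rule introduced on its previous visit have now
            # been tested against every rule exactly once: retire them.
            for pair in candidates:
                if active[pair] == idx:
                    del active[pair]
        searched_per_iteration.append(searched)
    return first, searched_per_iteration


FIRST, SEARCHED = solve_first_active(RULES, PRETERMINALS)
```

In this sketch the number of pairs searched falls in the later
iterations even though First keeps growing, which is the effect the
active list is meant to achieve.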

Using the Typed Feature Structure system 
(the LKB) of Briscoe et al. (1993), we wrote 
two grammars and tested the algorithm on 
them. Table 1 shows the average number of
pairs considered for each iteration compared 
to the average number of pairs in First. 





            13 Rule Grammar         21 Rule Grammar
            Considered   Total      Considered   Total
Iter. 1        3.5         3.5         8.4         8.4
Iter. 2        7.5        10.7         9.7        18.7
Iter. 3        1.2        12.0         1.0        19.0

Table 1: Average number of pairs per iteration.

As we can see, after the first iteration the 
number of pairs that needs to be considered 
is less (much less for the final iteration) than 
the total number of pairs in First. Similar im- 
provements in performance were obtained for 
the computation of FOLLOW. 



6 Related Research 

The extension to the LR algorithm presented by Nakazawa (1991) uses a
similar approach to that described here; the functions involved
however are those necessary for the construction of an LR parsing
table (i.e. the GOTO and ACTION functions). One technical difference
between the two approaches is that he uses positive restrictors
(Shieber 1985) instead of negative ones. In addition, both of his
algorithms also differ in another way from the algorithm described
here. The difference is that they add items to a set using simple set
addition whereas in the algorithm of Section 4.1 we add elements
using the operator +<. Furthermore, when computing the closure of a
set of items, both of the algorithms there ignore the effect that
unification has on the categories in the rules.

For example, the states of an LR parser are computed using the
closure operation on a set I of dotted rules or items. In Nakazawa's
algorithms computation of this closure proceeds as follows: if dotted
rule <A → w.Bx> is in I, then add a dotted rule <C → .y> to the
closure of I, where C and B unify. This ignores the fact that both
dotted rules may be modified after unification, and therefore, his
algorithm leads to less restricted I values than those implicit in
the grammar. To adapt our algorithm to the computation of the closure
of I for a feature-theoretic grammar would involve using a set of
pairs of dotted rules as the value of I.



7 Conclusion 

We have extended an algorithm that manip- 
ulates CF grammars to allow it to handle 
feature-theoretic ones. It was shown how most 
of the information contained in the grammar 
rules may be preserved by using a set of pairs 
as the value of a function and by using the 
notion of subsumption to update this set. Al- 
though the algorithm has in fact been used to 
adapt the constraint propagation algorithm of 
Brew (1992) to phrase structure grammars, the 
basic idea should be applicable to the rest of 
the functions needed for constructing LR ta- 
bles. However, such adaptations are left as a 
topic for future research. 

Finally, improvements in speed obtained 
with the active pairs mechanism of Section 5 
are of an order of magnitude in an implemen- 
tation using Common Lisp. 